Guard SDK Library
The dynamofl.guard methods enhance safety and guardrails around models. This page divides the SDK methods into the following sections:
- alignment: functions that assist in aligning models using preference data, such as DPO and RLHF.
- helper: functions that assist in a variety of safety and guardrail use cases.
Installation
Please follow the installation steps in the page titled "Installing ML SDK Libraries."
```python
from dynamofl.guard import *
```
Release Notes
Please see the release notes for dynamofl.guard in the release notes section of the documentation sidebar.
Alignment Methods
Method generate_prompts_relevant_to_policy()
Returns
List of prompts suitable to test or train LLMs on a specified custom alignment policy. Writes prompts using both diverse and in-domain algorithmic processes.
Parameters
- policy (str): Alignment guideline to adhere to.
- domain (str): Domain of the language model / topics it should answer. E.g. "General chatbot for finance customers."
- example_prompts (list[str]): List of example prompts.
- num_to_generate (int, default=8): Number of prompts to generate. Maximum 2048.
- diverse_ratio (float, default=0.5): Ratio of prompts that are diverse. With the default of 0.5, 50% of prompts will be diverse and 50% will be in-domain.
- example_subtopics (list[str], default=[], optional): List of example subtopics used to generate in-domain prompts. By default the input is an empty list and the algorithm will automatically generate subtopics (which it will print out for review if verbose is set to True).
- verbose (bool, default=False): Print statistics.
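As a sketch of how the diverse_ratio parameter splits the prompt budget; the exact rounding rule below is an assumption, since the docs only state the ratio:

```python
def split_prompt_budget(num_to_generate: int = 8, diverse_ratio: float = 0.5) -> tuple[int, int]:
    """Split the prompt budget into (diverse, in-domain) counts.

    Hypothetical helper mirroring the documented defaults; not part of
    the dynamofl.guard API.
    """
    num_diverse = round(num_to_generate * diverse_ratio)
    return num_diverse, num_to_generate - num_diverse
```

With the default num_to_generate=8 and diverse_ratio=0.5, four prompts are diverse and four are in-domain.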
Method formulate_critic_revision_prompts()
Returns
A tuple (critic_prompt, revision_prompt):
- critic_prompt is a string used to prompt for critiquing a response. E.g. "Discuss whether the response contained anything that may be interpreted as offensive or harmful. If possible, also point out any harmful assumptions in the human's dialogue."
- revision_prompt is a string used to prompt for revising a response. E.g. "Please write a new version of the response that removes offensive and harmful content, and politely addresses any problematic assumptions from the human."
Parameters
- policy (str): Alignment guideline to adhere to.
- verbose (bool, default=False): Print statistics.
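To illustrate the shape of the returned tuple, here is a hypothetical pair of templates built from a policy string. The actual method derives its prompts from the policy (the exact wording will differ), so this is only a sketch of the return value:

```python
def make_critic_revision_prompts(policy: str) -> tuple[str, str]:
    # Hypothetical templates, not the SDK's generated wording.
    critic_prompt = (
        "Discuss whether the response contained anything that may violate "
        f"the following policy: {policy} If possible, also point out any "
        "harmful assumptions in the human's dialogue."
    )
    revision_prompt = (
        "Please write a new version of the response that removes content "
        f"violating the following policy: {policy} Politely address any "
        "problematic assumptions from the human."
    )
    return critic_prompt, revision_prompt
```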
Method write_better_responses_critique()
This method improves base responses so that they comply with a policy.
Returns
Responses to prompts as a list of dictionaries:

```python
[
    {"prompt": ..., "rejected": ..., "chosen": ..., "rejected_critique": ...}
]
```
This is Dynamo's adaptive CAI approach.
- The base responses are set as the "rejected" response.
- An LLM is asked to critique the base response.
- An LLM is asked to generate a better "chosen" response based on the critique.
Parameters
- policy (str): Alignment guideline to adhere to.
- critic_prompt (str): Prompt used by the automated red-teaming model to judge responses.
- revision_prompt (str): Prompt used by the automated red-teaming model to improve upon responses.
- prompts_responses_list (list[dict]): Must be a list of dicts in the format [{"prompt": ..., "response": ...}].
- filter_good_responses (bool, default=False): Throw away datapoints where the model is already doing well. If True, we first critique each "response" in prompts_responses_list and throw it out if it is already rated at the top score.
- verbose (bool, default=False): Print statistics.
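The three steps above can be sketched as follows. Here critique_fn and revise_fn are hypothetical stand-ins for the SDK's LLM calls; only the record keys match the documented output format:

```python
def build_preference_records(prompts_responses_list, critique_fn, revise_fn):
    """Assemble preference records in the documented output format.

    critique_fn(prompt, response) and revise_fn(prompt, response, critique)
    stand in for the LLM calls the SDK performs internally.
    """
    records = []
    for item in prompts_responses_list:
        critique = critique_fn(item["prompt"], item["response"])
        chosen = revise_fn(item["prompt"], item["response"], critique)
        records.append({
            "prompt": item["prompt"],
            "rejected": item["response"],   # base response becomes "rejected"
            "chosen": chosen,               # revised response becomes "chosen"
            "rejected_critique": critique,
        })
    return records
```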
Helper Methods
Method generate_data_api()
Returns
JSON output from LLM APIs.
- If your input is a single string, this returns a string.
- If your input is a list of strings, this returns a list of strings.
Note: by default, enforce_output_key expects the LLM output to be in the format

```
{
    "generated": <output>
}
```

This method will return the <output>. It will ignore leading and trailing text before and after the curly braces.
Use environment variables to select the model and supply your API key / endpoint.

```python
import os

# 1. Specify your API model
# Options: ["mistral-tiny", "mistral-small-latest", "mistral-medium-latest", "mistral-large-latest",
#           "gpt-4", "gpt-3.5-turbo", "claude-3-opus-20240229", "claude-3-sonnet-20240229",
#           "custom"]
# Note: "custom" refers to any callable model API endpoint, e.g. Databricks' serving of Mistral.
os.environ['DYNAMO_DATA_GENERATION_MODEL'] = "custom"

# 2. Specify your API key
os.environ['DYNAMO_DATA_GENERATION_API_KEY'] = 'dapi[EXAMPLE]d0'  # databricks

# 3. If DYNAMO_DATA_GENERATION_MODEL == 'custom', then you must also specify the endpoint
os.environ['DYNAMO_DATA_GENERATION_ENDPOINT'] = 'https://dbc-[EXAMPLE].cloud.databricks.com/serving-endpoints/databricks-mixtral-8x7b-instruct/invocations'
```
Parameters
- prompt (Union[str, List[str]]): Input to the model. The prompt must tell the model to output a JSON with one key (and one key only): "generated". If prompt is a list, the model will generate a list of outputs; otherwise it will output a single string.
- temperature (float): Used during generation; must be between 0.0 and 1.0.
- model (str, default='mistral-small-latest'): Model to call. mistral-medium-latest is the better model, equivalent to or better than GPT-3.5. mistral-small-latest is Mixtral-8x7b, which is faster than mistral-medium-latest. gpt-4 will call the latest default GPT-4 model on OpenAI's API.
- verbose (bool, default=False): Print the output.
- endpoint (str, default=None): Endpoint used for any custom API via requests.
- api_key (str, default=None): Mistral API key. By default, we use the DynamoFL org's API token designed for external demos.
- required_characters (list[str], default=[]): List of characters to check for in the output. For example, passing in ['[', ']'] will raise a custom error if the output does not contain both '[' and ']'; the model is told which character is missing and asked to fix its output.
- enforce_dictionary (bool, default=True): Enforce that the model output is a dictionary. Cuts off all characters before the first '{' and after the last '}'. Raises an error and reprompts the model if the output does not contain a valid dictionary.
- enforce_output_key (str, default="generated"): Enforce that the JSON output has a certain key, and automatically return the value of this key in the JSON. If None, return the entire JSON.
- max_tokens (int, default=512): Max number of tokens to generate. Note: the Mistral API does not support max tokens at the moment. Note: OpenAI has a maximum of 4000 for max_tokens + len(prompt).
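The enforce_dictionary / enforce_output_key post-processing described above can be illustrated with a minimal sketch. This mirrors the documented behavior (strip text outside the outermost braces, parse, return the key's value), not the SDK's actual implementation, and omits the reprompting step:

```python
import json

def extract_generated(raw_output: str, enforce_output_key="generated"):
    """Strip text outside the outermost curly braces, parse the JSON, and
    return the value under enforce_output_key (or the whole dict if None).
    """
    start = raw_output.index("{")
    end = raw_output.rindex("}")
    parsed = json.loads(raw_output[start:end + 1])
    if enforce_output_key is None:
        return parsed
    return parsed[enforce_output_key]
```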
Method find_similar_strings()
Returns
List of indices. Each index identifies an element of the list whose cosine similarity to another string in the list exceeds similarity_threshold.
- Note: uses an LLM to generate embeddings and cosine similarity.
- Note: uses sentence_transformers.
Parameters
- string_list (list[str]): List of strings to compare for similarity.
- similarity_threshold (float, default=0.75): Threshold for cosine similarity.
- model_id (str, default='all-MiniLM-L12-v2'): Model used to compute embeddings. Options: all-MiniLM-L12-v2, all-MiniLM-L6-v2.
- verbose (bool, default=False): Print examples of sentences that were similar.
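The comparison itself can be sketched with toy vectors. In the SDK the vectors come from sentence_transformers embeddings, and the pairwise loop here is an assumption about the strategy rather than the SDK's code:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_indices(vectors, similarity_threshold=0.75):
    # Flag every vector whose similarity to some *other* vector exceeds
    # the threshold, mirroring the documented return value.
    flagged = []
    for i, vi in enumerate(vectors):
        for j, vj in enumerate(vectors):
            if i != j and cosine_similarity(vi, vj) > similarity_threshold:
                flagged.append(i)
                break
    return flagged
```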
Method jsonl_to_csv()
Returns
Writes the contents of a JSONL file into a CSV file.
Parameters
- jsonl_file_path (str): JSONL location to copy contents from.
- csv_file_path (str): CSV location to write contents into.
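A rough stdlib equivalent of this conversion, assuming flat JSON records with consistent keys (the SDK's handling of nested or ragged records is not documented):

```python
import csv
import json

def jsonl_to_csv(jsonl_file_path: str, csv_file_path: str) -> None:
    # Read one JSON object per line from the JSONL file.
    with open(jsonl_file_path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    if not rows:
        return
    # Use the keys of the first record as the CSV header.
    with open(csv_file_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```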
Method csv_to_jsonl()
Returns
Writes the contents of a CSV file into a JSONL file.
Parameters
- csv_file_path (str): CSV location to copy contents from.
- jsonl_file_path (str): JSONL location to write contents into.
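And the reverse direction, again as a hedged stdlib sketch rather than the SDK implementation (note that CSV values round-trip as strings):

```python
import csv
import json

def csv_to_jsonl(csv_file_path: str, jsonl_file_path: str) -> None:
    # Each CSV row becomes one JSON object per line in the output file.
    with open(csv_file_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(jsonl_file_path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```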